
Conversation

@jimczi (Contributor) commented Mar 14, 2025

Backports #124313 to 8.x:

This refactor improves memory efficiency by processing inference requests in batches whose combined input size is capped in bytes.
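As a rough illustration of the batching idea (not this PR's actual code; the class and method names below are made up), here is a standalone sketch that splits inputs into batches bounded by a byte budget:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split a list of input strings into batches whose
// combined UTF-8 size stays under a byte cap, so each inference request
// is bounded in memory.
public final class BatchingSketch {
    public static List<List<String>> batchByBytes(List<String> inputs, long maxBatchSizeInBytes) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (String input : inputs) {
            long size = input.getBytes(StandardCharsets.UTF_8).length;
            // Close the current batch if adding this input would exceed the cap;
            // a single oversized input still gets a batch of its own.
            if (!current.isEmpty() && currentBytes + size > maxBatchSizeInBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(input);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```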

Changes include:
- A new dynamic operator setting to control the maximum batch size in bytes (see the sketch after this list).
- Dropping input data from inference responses when the legacy semantic text format isn’t used, saving memory.
- Eagerly clearing inference results after each bulk item to free up memory sooner.
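
For the first bullet, a minimal sketch of how an operator-dynamic byte-size setting can be declared with Elasticsearch's Setting API; the setting key, default value, and class name below are illustrative assumptions, not the ones introduced by this PR:

```java
import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.unit.ByteSizeValue;

// Illustrative only: the key and default are placeholders, not this PR's.
public final class InferenceBatchSettings {
    // OperatorDynamic makes the setting updatable at runtime, but only by
    // operator users; NodeScope registers it as a node-level setting.
    public static final Setting<ByteSizeValue> MAX_BATCH_SIZE_IN_BYTES = Setting.byteSizeSetting(
        "example.inference.max_batch_size_in_bytes", // hypothetical key
        ByteSizeValue.ofMb(1),                       // hypothetical default
        Setting.Property.NodeScope,
        Setting.Property.OperatorDynamic
    );
}
```

Once registered, a setting declared this way can be updated at runtime through the cluster settings API (by an operator user) without a node restart.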

This is a step toward enabling circuit breakers to better handle memory usage when dealing with large inputs.
@jimczi added the following labels on Mar 14, 2025: :Search Relevance/Search (catch-all for Search Relevance), :SearchOrg/Inference (label for the Search Inference team), :SearchOrg/Relevance (label for the Search (solution/org) Relevance team), >enhancement, auto-merge-without-approval (automatically merge the pull request when CI checks pass; NB: doesn't wait for reviews!), and backport.
@elasticsearchmachine merged commit 17e2721 into elastic:8.x on Mar 14, 2025 (15 checks passed).
@jimczi deleted the backport/8.x/pr-124313 branch on March 14, 2025 at 11:03.